category: main
step: 2_staging
sub_step: 2_combine
doc_status: ready
language: eng
macro combine
Name | Category | In Main Macro | Doc Status |
---|---|---|---|
custom_union_relations | auxiliary | combine, create_dataset | ready |
get_relations_by_re | auxiliary | normalize, combine | ready |
The combine macro is designed to combine the data for each pipeline.
The name of the dbt model (that is, the name of the SQL file in the models folder) must match the template:
combine_{pipeline_name}
For example, combine_events.
The macro is called inside this file:
{{ datacraft.combine() }}
Above the macro call, the file specifies the data dependencies via -- depends_on comments. The entire contents of the file look, for example, like this:
-- depends_on: {{ ref('join_appmetrica_events') }}
-- depends_on: {{ ref('join_ym_events') }}
{{ datacraft.combine() }}
This macro accepts the following arguments:
params (default: none)
disable_incremental (default: none)
override_target_model_name (default: none)
date_from (default: none)
date_to (default: none)
limit0 (default: none)
First, the macro determines the model name, either from the passed argument (override_target_model_name) or from the file name (this.name). When the override_target_model_name argument is used, the macro behaves as if it were called in a model whose name equals the value of override_target_model_name.
The model name, obtained in either way, is split into parts on the underscore. For example, the name combine_events is split into 2 parts, and the macro takes from them:
pipeline_name → events
Models of the combine step can have names of 2 or 3 parts; the second part is always the pipeline.
If the model belongs to the registry pipeline, its name also contains a link. For example, for the combine_registry_appprofilematching model:
link_name → appprofilematching
If the model name does not match the template, the macro returns an error.
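The name-splitting rule described above can be sketched in Python. This is only an illustration, not the macro's actual Jinja code: the function name parse_combine_model_name and the error text are hypothetical, and the sketch assumes that a third name part, when present, is always the link name.

```python
def parse_combine_model_name(model_name):
    """Split a combine model name into (pipeline_name, link_name).

    Hypothetical helper: mirrors the documented rule, not the macro source.
    """
    parts = model_name.split("_")
    # The name must have 2 or 3 non-empty parts and start with the step name.
    if len(parts) not in (2, 3) or parts[0] != "combine" or not all(parts):
        raise ValueError(
            f"Model name {model_name!r} does not match the combine template"
        )
    pipeline_name = parts[1]
    # Assumption: the optional third part is the link name (registry pipeline).
    link_name = parts[2] if len(parts) == 3 else None
    return pipeline_name, link_name
```

For combine_events the sketch yields the pipeline events and no link; for combine_registry_appprofilematching it yields the pipeline registry and the link appprofilematching.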
Next, the macro builds a pattern to find the corresponding tables from the previous step (join).
The pattern for the registry pipeline is:
'join' ~ '_[^_]+_' ~ pipeline_name ~ '_' ~ link_name
For all other data, the pattern is:
'join' ~ '_[^_]+_' ~ pipeline_name
The macro finds all the relevant tables using another macro, get_relations_by_re.
If no data is found, the macro will return an error.
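To make the two patterns concrete, here is a minimal Python sketch of how they could be assembled and applied to a list of table names. The helper names build_join_pattern and find_relations are hypothetical, and matching against the whole table name is an assumption; how get_relations_by_re actually applies the pattern is not described here.

```python
import re


def build_join_pattern(pipeline_name, link_name=None):
    """Assemble the pattern for the join-step tables of a pipeline."""
    pattern = "join" + "_[^_]+_" + pipeline_name
    if pipeline_name == "registry":
        # The registry pipeline additionally filters by link name.
        pattern += "_" + link_name
    return pattern


def find_relations(table_names, pattern):
    """Return the table names matching the pattern, or fail if none match."""
    # Assumption: the pattern must match the entire table name.
    matched = [name for name in table_names if re.fullmatch(pattern, name)]
    if not matched:
        raise LookupError(f"No tables found for pattern {pattern!r}")
    return matched
```

With the table names join_appmetrica_events, join_ym_events and join_ym_datestat, the pattern built for the events pipeline selects only the first two.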
Then the tables found are automatically combined (UNION ALL) using the custom_union_relations macro. The result forms the source table, source_table.
If the data belongs to the datestat or events pipeline, the table is materialized incrementally:
{{ config(
    materialized='incremental',
    order_by=('__date', '__table_name'),
    incremental_strategy='delete+insert',
    unique_key=['__date', '__table_name'],
    on_schema_change='fail'
) }}
For other pipelines, the materialization will be different:
{{ config(
    materialized='table',
    order_by=('__table_name'),
    on_schema_change='fail'
) }}
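The choice between the two configurations can be summarized with a short Python sketch. It only restates the rule above as a lookup; the names INCREMENTAL_PIPELINES and combine_config are hypothetical, and the sketch does not model the disable_incremental argument, whose effect is not described here.

```python
# Pipelines that are materialized incrementally, as listed above.
INCREMENTAL_PIPELINES = ("datestat", "events")


def combine_config(pipeline_name):
    """Return the config values the combine step would use for a pipeline."""
    if pipeline_name in INCREMENTAL_PIPELINES:
        return {
            "materialized": "incremental",
            "order_by": ("__date", "__table_name"),
            "incremental_strategy": "delete+insert",
            "unique_key": ["__date", "__table_name"],
            "on_schema_change": "fail",
        }
    # Every other pipeline is rebuilt as a plain table.
    return {
        "materialized": "table",
        "order_by": "__table_name",
        "on_schema_change": "fail",
    }
```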
In the automatically generated SQL query, the SELECT block selects all columns from the previously created source_table. The __table_name column is wrapped in LowCardinality to improve performance.
If the limit0 argument is enabled, LIMIT 0 is added at the end of the SQL query.
An SQL file in the models folder. File name:
combine_events
File contents:
-- depends_on: {{ ref('join_appmetrica_events') }}
-- depends_on: {{ ref('join_ym_events') }}
{{ datacraft.combine() }}
This is the fourth of the main macros.